Overpruning Large Decision Trees
Author
Abstract
This paper presents empirical evidence for five hypotheses about learning from large noisy domains: that trees built from very large training sets are larger and more accurate than trees built from subsets, even large ones; that this increased accuracy is only in part due to the extra size of the trees; and that the extra training instances allow both better choices of attribute while building the tree, and better choices of the subtrees to prune after it has been built. For the practitioner with the common goals of maximising the accuracy and minimising the size of induced trees, these conclusions prompt new techniques for induction on large training sets. Although building huge trees from huge training sets is computationally expensive, pruning smaller trees on them is not, yet it improves accuracy. Where a pruned tree is considered too large for human or machine limitations, it can be overpruned to an acceptable size. Although this requires far more time than building a tree of that size from a correspondingly small training set, it will usually be more accurate. The paper also describes an algorithm for overpruning trees to user-specified size limits; it is evaluated in the course of testing the above hypotheses.
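As a rough illustration of the idea rather than the paper's algorithm, the sketch below prunes a tree grown on a large training set down to a user-specified node limit, using scikit-learn's cost-complexity pruning path as a stand-in pruning criterion. The synthetic dataset, the 31-node limit, and the overprune helper are illustrative assumptions; a faithful implementation would prune the already-built tree in place rather than refit at each pruning level.

```python
# Minimal sketch of "overpruning" a decision tree to a user-specified size limit.
# Assumptions: synthetic data, sklearn's cost-complexity pruning as the pruning
# criterion, and refitting at each candidate level for simplicity.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Grow a large tree on the (here: synthetic) training set.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Candidate pruning levels, in increasing order; larger alpha => smaller tree.
alphas = full_tree.cost_complexity_pruning_path(X_tr, y_tr).ccp_alphas

def overprune(max_nodes):
    """Return the largest cost-complexity-pruned tree within the size limit."""
    for alpha in alphas:
        candidate = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_tr, y_tr)
        if candidate.tree_.node_count <= max_nodes:
            return candidate  # first alpha under the limit gives the largest such tree
    return None

small_tree = overprune(max_nodes=31)  # hypothetical user-specified size limit
print(small_tree.tree_.node_count, small_tree.score(X_te, y_te))
```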
Similar resources
Comparison of Ordinal Response Modeling Methods like Decision Trees, Ordinal Forest and L1 Penalized Continuation Ratio Regression in High Dimensional Data
Background: Response variables in most medical and health-related research have an ordinal nature. Conventional modeling methods assume predictor variables to be independent, and consider a large number of samples (n) compared to the number of covariates (p). Therefore, it is not possible to use conventional models for high dimensional genetic data in which p > n. The present study compared th...
A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining
Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...
Predicting The Type of Malaria Using Classification and Regression Decision Trees
Maryam Ashoori (School of Technical and Engineering, Higher Educational Complex of Saravan, Saravan, Iran), Fatemeh Hamzavi (School of Agriculture, Higher Educational Complex of Saravan, Saravan, Iran). Background: Malaria is an infectious disease infecting 200-300 million people annually. Environme...
A visualization tool for interactive learning of large decision trees
Decision tree induction is certainly among the most applicable learning techniques due to its power and simplicity. However, learning decision trees from large datasets, particularly in data mining, is quite different from learning from small or moderately sized datasets. When learning from large datasets, decision tree induction programs often produce very large trees. How to visualize efficie...
Compiling large-context phonetic decision trees into finite-state transducers
Recent work has shown that the use of finite-state transducers (FST’s) has many advantages in large vocabulary speech recognition. Most past work has focused on the use of triphone phonetic decision trees. However, numerous applications use decision trees that condition on wider contexts; for example, many systems at IBM use 11-phone phonetic decision trees. Alas, large-context phonetic decisio...